Search CORE


Transforming Time Series for Efficient and Accurate Classification

Author: Li Daoyuan
Publication venue: University of Luxembourg, Luxembourg, Luxembourg
Publication date: 11/01/2018

Time series data refer to sequences of data that are ordered either temporally, spatially or in another defined order. They are found in a variety of domains, including financial data analysis, medical and health monitoring, and industrial automation. Due to their abundance and wide range of application scenarios, there has been an increasing need for efficient machine learning algorithms to extract information and build knowledge from these data. One of the major tasks in time series mining is time series classification (TSC), which consists of applying a learning algorithm to labeled data to train a model that is then used to predict the classes of samples from an unlabeled data set. Due to the sequential nature of time series data, state-of-the-art classification algorithms (such as SVM and Random Forest) that perform well on generic data are usually not suitable for TSC. To improve the performance of TSC tasks, this dissertation proposes different methods to transform time series data for a better feature extraction process, as well as novel algorithms that achieve better classification performance in terms of computational efficiency and classification accuracy. In the first part of this dissertation, we conduct a large-scale empirical study that takes advantage of the discrete wavelet transform (DWT) for time series dimensionality reduction. We first transform real-valued time series data using different families of DWT. Then we apply dynamic time warping (DTW)-based 1NN classification on 39 datasets and find that existing DWT-based lossy compression approaches can help to overcome the challenges of storage and computation time. Furthermore, we provide assurances to practitioners by empirically showing, with various datasets and several DWT approaches, that TSC algorithms yield similar accuracy on both compressed (i.e., approximated) and raw time series data.
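The DWT-plus-1NN-DTW pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's code: it assumes the Haar wavelet only (the study compares several DWT families), keeps just the approximation coefficients as the lossy compressed series, and uses unconstrained DTW with a squared pointwise cost. The function names `haar_approx`, `dtw` and `nn1_dtw` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

def haar_approx(x: Sequence[float], levels: int) -> List[float]:
    """Keep only the Haar approximation coefficients after `levels` decomposition steps.

    Each step halves the length (an odd trailing sample is dropped); discarding the
    detail coefficients is what makes this a lossy compression.
    """
    a = list(x)
    for _ in range(levels):
        if len(a) < 2:
            break
        a = [(a[i] + a[i + 1]) / math.sqrt(2) for i in range(0, len(a) - 1, 2)]
    return a

def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping distance with squared pointwise cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][m])

def nn1_dtw(train: List[Tuple[Sequence[float], str]], query: Sequence[float],
            levels: int = 0) -> Optional[str]:
    """1NN: compress every series, return the label of the training series closest under DTW."""
    q = haar_approx(query, levels)
    best_label, best_dist = None, float("inf")
    for series, label in train:
        dist = dtw(haar_approx(series, levels), q)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

With `levels=1` each series is halved in length before DTW, so the quadratic DTW cost drops roughly fourfold; raising `levels` trades further storage and time savings against the risk of discarding discriminative detail.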
We also show that, in some datasets, wavelets may actually help reduce noisy variations that deteriorate the performance of TSC tasks. In a few cases, we note that the residual details/noise from compression are more useful for recognizing data patterns. In the second part, we propose a language model-based approach for TSC named Domain Series Corpus (DSCo), in order to take advantage of mature techniques from both the time series mining and Natural Language Processing (NLP) communities. After transforming real-valued time series into text using Symbolic Aggregate approXimation (SAX), we build per-class language models (unigrams and bigrams) from these symbolized text corpora. To classify unlabeled samples, we compute the fitness of each symbolized sample against all per-class models and choose the class represented by the model with the best fitness score. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs similarly to approaches working with the original uncompressed numeric data. We further propose DSCo-NG to improve the computational efficiency and classification accuracy of DSCo. In contrast to DSCo, where we try to find the best way to recursively segment time series, DSCo-NG breaks time series into smaller segments of the same size; this simplification also leads to simpler language model inference in the training phase and slightly higher classification accuracy. The third part of this dissertation presents a multiscale visibility graph representation for time series, as well as feature extraction methods for TSC, so that both global and local features are fully extracted from time series data. Unlike traditional TSC approaches that seek to find global similarities in time series databases (e.g., 1NN-DTW) or methods specializing in locating local patterns/subsequences (e.g., shapelets), we extract solely statistical features from graphs that are generated from time series.
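The SAX-then-language-model idea behind DSCo can be illustrated with the following sketch. It is a simplification under stated assumptions: each series becomes one SAX word with a fixed number of equal-width segments (closer in spirit to DSCo-NG's fixed-size segmentation than to DSCo's recursive search), only a per-class bigram model with add-one smoothing is used, and "fitness" is taken to be the bigram log-likelihood. All names (`sax`, `BigramModel`, `train_models`, `classify`) are hypothetical.

```python
import bisect
import math
from collections import Counter, defaultdict
from statistics import NormalDist, mean, pstdev
from typing import Dict, List, Sequence, Tuple

def sax(x: Sequence[float], n_segments: int, alphabet: str = "abcd") -> str:
    """SAX: z-normalise, reduce to n_segments means (PAA), map each mean to a letter."""
    mu, sd = mean(x), pstdev(x)
    z = [(v - mu) / sd if sd > 0 else 0.0 for v in x]
    # PAA over (roughly) equal-width segments; assumes n_segments <= len(x)
    bounds = [round(i * len(z) / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints split N(0, 1) into len(alphabet) equiprobable regions
    a = len(alphabet)
    breaks = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return "".join(alphabet[bisect.bisect_right(breaks, v)] for v in paa)

class BigramModel:
    """Per-class bigram language model over SAX letters with add-one smoothing."""

    def __init__(self, alphabet: str) -> None:
        self.vocab_size = len(alphabet)
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, words: List[str]) -> None:
        for w in words:
            for prev, cur in zip(w, w[1:]):
                self.counts[prev][cur] += 1

    def log_prob(self, word: str) -> float:
        total = 0.0
        for prev, cur in zip(word, word[1:]):
            row = self.counts[prev]
            total += math.log((row[cur] + 1) / (sum(row.values()) + self.vocab_size))
        return total

def train_models(train: List[Tuple[Sequence[float], str]], n_segments: int,
                 alphabet: str = "abcd") -> Dict[str, BigramModel]:
    """Symbolise every training series and fit one bigram model per class label."""
    words_by_class: Dict[str, List[str]] = defaultdict(list)
    for series, label in train:
        words_by_class[label].append(sax(series, n_segments, alphabet))
    models = {}
    for label, words in words_by_class.items():
        models[label] = BigramModel(alphabet)
        models[label].fit(words)
    return models

def classify(models: Dict[str, BigramModel], series: Sequence[float], n_segments: int,
             alphabet: str = "abcd") -> str:
    """Pick the class whose model gives the symbolised series the highest log-likelihood."""
    word = sax(series, n_segments, alphabet)
    return max(models, key=lambda label: models[label].log_prob(word))
```

A unigram model, or an interpolation of unigram and bigram probabilities, could be plugged into `log_prob` in the same way.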
Specifically, we augment time series by means of their multiscale approximations, which are further transformed into a set of visibility graphs. After extracting probability distributions of small motifs, density, assortativity, etc., we use these features to build highly accurate classification models with generic classifiers (e.g., Support Vector Machine and eXtreme Gradient Boosting). Based on extensive experiments on a large number of open datasets and comparison with five state-of-the-art TSC algorithms, our approach is shown to be both accurate and efficient: it is more accurate than Learning Shapelets and at the same time faster than Fast Shapelets. Finally, we list a few industrial applications that are relevant to our research work, including Non-Intrusive Load Monitoring as well as anomaly detection and visualization by means of hierarchical clustering for time series data. In summary, this dissertation explores different possibilities to improve the efficiency and accuracy of TSC algorithms. To that end, we employ a range of techniques including wavelet transforms, symbolic approximations, language models and graph mining algorithms. We experiment with and evaluate our approaches using publicly available time series datasets. Comparison with the state of the art shows that the approaches developed in this dissertation perform well and contribute to advancing the field of TSC.
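A minimal sketch of multiscale visibility graph features, assuming the natural visibility criterion (two points are connected if every intermediate point lies strictly below the straight line joining them) and, as a stand-in for the dissertation's multiscale approximations, repeated averaging of non-overlapping pairs of points. Only density, mean degree and maximum degree are computed; the motif distributions and assortativity mentioned above are omitted. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set

Graph = Dict[int, Set[int]]

def natural_visibility_graph(y: Sequence[float]) -> Graph:
    """Connect points a < b if every intermediate point lies strictly below the segment joining them."""
    adj: Graph = {i: set() for i in range(len(y))}
    for a, b in combinations(range(len(y)), 2):
        if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a) for c in range(a + 1, b)):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def coarsen(y: Sequence[float]) -> List[float]:
    """Halve the resolution by averaging non-overlapping pairs (an odd trailing point is dropped)."""
    return [(y[i] + y[i + 1]) / 2 for i in range(0, len(y) - 1, 2)]

def graph_features(adj: Graph) -> List[float]:
    """Density, mean degree and maximum degree of an undirected graph."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    n_edges = sum(degrees) // 2
    density = 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0
    return [density, sum(degrees) / n, float(max(degrees))]

def multiscale_features(y: Sequence[float], n_scales: int = 3) -> List[float]:
    """Concatenate graph features computed at successively coarser approximations of y."""
    features: List[float] = []
    series = list(y)
    for _ in range(n_scales):
        if len(series) < 2:
            break
        features.extend(graph_features(natural_visibility_graph(series)))
        series = coarsen(series)
    return features
```

For series of equal length the result is a fixed-length vector that could be passed to any generic classifier, such as an SVM or gradient-boosted trees. The O(n^3) graph construction here is chosen for clarity, not speed.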

Open Repository and Bibliography - Luxembourg

Slab control on the mega-sized North Pacific ultra-low velocity zone.

Author: Bower Daniel J.
Li Jiewen
Sun Daoyuan
Publication venue: Springer Science and Business Media LLC
Publication date: 24/02/2022

Ultra-low velocity zones (ULVZs) are localized small-scale patches with extreme physical properties at the core-mantle boundary that often gather at the margins of Large Low Velocity Provinces (LLVPs). Recent studies have discovered several mega-sized ULVZs with a lateral dimension of ~900 km. However, the detailed structures and physical properties of these ULVZs and their relationship to LLVP edges are not well constrained, and their formation mechanisms are poorly understood. Here, we break the degeneracy between the size and velocity perturbation of a ULVZ using two orthogonal seismic ray paths, and thereby discover a mega-sized ULVZ at the northern edge of the Pacific LLVP. The ULVZ is almost double the size of a previously imaged ULVZ in this region, but with half the shear velocity reduction. This mega-sized ULVZ has accumulated owing to stable mantle flow, driven by slab debris in the lower mantle, that converges at the LLVP edge. Such flow also produces the subvertical, north-tilting edge of the Pacific LLVP.

PubMed Central

Bern Open Repository and Information System (BORIS)

Efficient Personalized Federated Learning via Sparse Model-Adaptation

Author: Chen Daoyuan
Ding Bolin
Gao Dawei
Li Yaliang
Yao Liuyi
Publication date: 09/06/2023

Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their private data. Due to the heterogeneity of clients' local data distributions, recent studies explore personalized FL, which learns and deploys distinct local models with the help of auxiliary global models. However, clients can be heterogeneous not only in their local data distributions but also in their computation and communication resources. The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. With a lightweight trainable gating layer, pFedGate enables clients to reach their full potential in model capacity by generating different sparse models that account for both the heterogeneous data distributions and resource constraints. Meanwhile, both computation and communication efficiency are improved thanks to the adaptability between model sparsity and clients' resources. Further, we theoretically show that the proposed pFedGate has superior complexity with guaranteed convergence and generalization error. Extensive experiments show that pFedGate simultaneously achieves superior global accuracy, individual accuracy and efficiency compared with state-of-the-art methods. We also demonstrate that pFedGate performs better than competitors in novel-client participation and partial-client participation scenarios, and can learn meaningful sparse local models adapted to different data distributions. Comment: Accepted to ICML 202
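The abstract does not spell out the gating mechanism, so the following is only a hypothetical sketch of the general idea: a small gating layer maps a client-side summary vector to one score per parameter block, each client keeps the top-scoring blocks that fit its own resource budget, and the kept blocks of the shared global model are rescaled by their scores. The top-k rule, the sigmoid scores and all function names are assumptions, not pFedGate's actual design.

```python
import math
from typing import List, Sequence

def gate_scores(features: Sequence[float], gate_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical lightweight gating layer: one linear score per parameter block, squashed into (0, 1)."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * f for w, f in zip(row, features))))
        for row in gate_weights
    ]

def sparse_mask(scores: Sequence[float], budget: float) -> List[float]:
    """Keep only the top-scoring blocks so that about `budget` (a fraction in (0, 1]) of blocks remain."""
    k = max(1, math.floor(budget * len(scores)))
    keep = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]

def personalised_blocks(global_blocks: Sequence[Sequence[float]], scores: Sequence[float],
                        mask: Sequence[float]) -> List[List[float]]:
    """Scale each block of the shared global model by its gate score; masked-out blocks become zero."""
    return [[w * s * m for w in block] for block, s, m in zip(global_blocks, scores, mask)]
```

A low-resource client would call `sparse_mask` with a small `budget`, while a well-resourced client could keep most blocks, so neither is constrained by the other's capacity.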

arXiv.org e-Print Archive

Understanding Android App Piggybacking: A Systematic Study of Malicious Code Grafting

Author: Bissyande Tegawendé François D Assise
Cavallaro Lorenzo
Klein Jacques
Le Traon Yves
Li Daoyuan
Li Li
Lo David
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

The Android packaging model offers ample opportunities for malware writers to piggyback malicious code in popular apps, which can then be easily spread to a large user base. Although recent research has produced approaches and tools to identify piggybacked apps, the literature lacks a comprehensive investigation into this phenomenon. We fill this gap by 1) systematically building a large set of piggybacked and benign app pairs, which we release to the community, 2) empirically studying the characteristics of malicious piggybacked apps in comparison with their benign counterparts, and 3) providing insights on piggybacking processes. Among several findings providing insights that analysis techniques should build upon to improve the overall detection and classification accuracy of piggybacked apps, we show that piggybacking operations not only concern app code but also extensively manipulate app resource files, largely contradicting common beliefs. We also find that piggybacking is done with little sophistication, in many cases automatically, and often via library code.
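One way to check, for a single piggybacked/benign pair, whether the changes touch resource files as well as code is to compare the two APK archives entry by entry. The sketch below is a hypothetical helper, not the study's tooling: it relies only on Python's standard `zipfile` module and on the fact that an APK is a ZIP archive, compares the stored CRC-32 values, and uses a crude file-extension heuristic to label entries as code or resources.

```python
import zipfile
from typing import BinaryIO, Dict, List, Tuple, Union

Source = Union[str, BinaryIO]

def entry_crcs(apk: Source) -> Dict[str, int]:
    """Map every file inside an APK (a ZIP archive) to its stored CRC-32."""
    with zipfile.ZipFile(apk) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}

def kind(name: str) -> str:
    """Crude heuristic: DEX bytecode and native libraries count as code, everything else as resources."""
    return "code" if name.endswith((".dex", ".so")) else "resource"

def diff_pair(benign: Source, piggybacked: Source) -> Dict[Tuple[str, str], List[str]]:
    """Group added/removed/changed entries of a (benign, piggybacked) pair by code vs resource."""
    a, b = entry_crcs(benign), entry_crcs(piggybacked)
    groups = {
        "added": b.keys() - a.keys(),
        "removed": a.keys() - b.keys(),
        "changed": {n for n in a.keys() & b.keys() if a[n] != b[n]},
    }
    summary: Dict[Tuple[str, str], List[str]] = {}
    for change, names in groups.items():
        for name in sorted(names):
            summary.setdefault((change, kind(name)), []).append(name)
    return summary
```

Aggregating such summaries over many pairs would show how often resource entries, and not only code, are added or modified.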

Crossref

Royal Holloway - Pure

Institutional Knowledge at Singapore Management University

King's Research Portal

Open Repository and Bibliography - Luxembourg

Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks

Author: Chen Daoyuan
Cheng Minhao
Ding Bolin
Li Yaliang
Qin Zeyu
Yao Liuyi
Publication date: 05/06/2023

In this work, besides improving prediction accuracy, we study whether personalization could bring robustness benefits against backdoor attacks. We conduct the first study of backdoor attacks in the personalized federated learning (pFL) framework, testing 4 widely used backdoor attacks against 6 pFL methods on the benchmark datasets FEMNIST and CIFAR-10, a total of 600 experiments. The study shows that pFL methods with partial model-sharing can significantly boost robustness against backdoor attacks. In contrast, pFL methods with full model-sharing do not show robustness. To analyze the reasons for these varying robustness performances, we provide comprehensive ablation studies on different pFL methods. Based on our findings, we further propose a lightweight defense method, Simple-Tuning, which empirically improves defense performance against backdoor attacks. We believe that our work can both provide guidance for pFL applications in terms of robustness and offer valuable insights for designing more robust FL methods in the future. We open-source our code to establish the first benchmark for black-box backdoor attacks in pFL: https://github.com/alibaba/FederatedScope/tree/backdoor-bench. Comment: KDD 202
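To make the distinction between partial and full model-sharing concrete, here is a minimal sketch of one aggregation round in which only a chosen subset of layers is averaged across clients while the remaining layers stay local. It assumes equal client weights (FedAvg normally weights by local sample count) and represents each layer as a flat list of floats; the layer names `body` and `head` and the function names are illustrative.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]

def average_layers(clients: List[Params], names: Set[str]) -> Params:
    """Unweighted element-wise mean of the named layers across clients (FedAvg-style, equal weights)."""
    n = len(clients)
    return {
        name: [sum(c[name][i] for c in clients) / n for i in range(len(clients[0][name]))]
        for name in names
    }

def sharing_round(clients: List[Params], shared: Set[str]) -> List[Params]:
    """One communication round: shared layers are replaced by the average, the rest stay local."""
    avg = average_layers(clients, shared)
    return [
        {name: list(avg[name]) if name in shared else list(vals) for name, vals in c.items()}
        for c in clients
    ]
```

Sharing only `{"body"}` corresponds to partial model-sharing; sharing every layer corresponds to full model-sharing, where each client's values in every layer are overwritten by the average.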

arXiv.org e-Print Archive

The Research of Population Genetic Differentiation for Marine Fishes (Hyporthodus septemfasciatus) Based on Fluorescent AFLP Markers

Author: Li Jun
Liu Jing
Liu Qinghua
Ma Daoyuan
Xiao Yongshuang
Xiao Zhizhong
Publication venue: IntechOpen
Publication date: 05/11/2018

Hyporthodus septemfasciatus is a commercially important proliferation fish distributed in the coastal waters of Japan, Korea, and China. We used the fluorescent AFLP technique to assess genetic differentiation between broodstock and offspring populations. A total of 422 polymorphic bands (70.10%) were detected among the 602 amplified bands. In total, 308 polymorphic loci were detected in broodstock I (P = 55.50%), compared with 356 in broodstock II (P = 63.12%) and 294 in the offspring (P = 52.88%), where P is the percentage of polymorphic loci. Genetic diversity was higher in the broodstock populations than in the offspring. Both AMOVA and Fst analyses showed significant genetic differentiation among populations, and limited fishery recruitment to the offspring was detected. STRUCTURE and PCoA analyses indicated that two management units existed and that most offspring individuals (95.0%) originated from only 44.0% of the individuals of broodstock I, which may have negative effects on sustainable fry production.
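The percentage of polymorphic loci (P) can be computed directly from a binary band-presence matrix. The following sketch assumes the strict criterion stated in its docstring (studies often use a 95% or 99% frequency threshold instead), uses a hypothetical function name, and also reproduces the arithmetic behind the overall 70.10% figure.

```python
from typing import Sequence

def percent_polymorphic(bands: Sequence[Sequence[int]]) -> float:
    """Percentage of polymorphic loci in a 0/1 band matrix (rows = individuals, columns = loci).

    Strict criterion used here: a locus is polymorphic if the band is present in at least
    one individual and absent in at least one other.
    """
    n_ind, n_loci = len(bands), len(bands[0])
    polymorphic = sum(1 for j in range(n_loci) if 0 < sum(row[j] for row in bands) < n_ind)
    return 100.0 * polymorphic / n_loci

# The overall figure reported above: 422 polymorphic out of 602 amplified bands
print(f"{100 * 422 / 602:.2f}%")  # prints 70.10%
```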

IntechOpen

Crossref